
fastdup By Visual-Layer

fastdup is a tool for gaining insights from large image and video collections. It can find anomalies, duplicate and near-duplicate images/videos, and clusters of similar items, and it can learn the normal behavior of a collection and the temporal interactions between images/videos. It can be used for smart subsampling to build a higher-quality dataset, for outlier removal, and for novelty detection that selects new information to send for tagging. Just 2 lines of code are enough to get started.
fastdup is:
  • Unsupervised: fits any dataset
  • Scalable: handles 400M images on a single machine
  • Efficient: works on CPU only
  • Low cost: can process 12M images on a $1 cloud machine budget

From the authors of GraphLab and Turi Create.
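Near-duplicate detection can be pictured with a generic approach: represent each image as an embedding vector and flag pairs whose cosine similarity passes a threshold. The sketch below only illustrates that idea and is not fastdup's internal implementation; the toy 3-D vectors, the file names, and the 0.96 threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(embeddings, threshold=0.96):
    """Return (name_a, name_b, similarity) for every pair at or above threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, sim))
    return pairs


# Hypothetical 3-D "embeddings"; real image embeddings have hundreds of dimensions.
embeddings = {
    "cat1.jpg": [0.9, 0.1, 0.0],
    "cat1_copy.jpg": [0.91, 0.09, 0.01],
    "car.jpg": [0.0, 0.2, 0.98],
}
pairs = near_duplicate_pairs(embeddings)
print(pairs)  # a single pair: cat1.jpg / cat1_copy.jpg
```

Comparing every pair is quadratic in the number of images, so at the scale quoted above a practical tool would typically restrict the comparison to each image's k nearest neighbors, which is what the `nearest_neighbors_k` parameter of `fastdup.run` suggests.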

Quick installation

  • Python 3.7, 3.8, 3.9
  • Supported OS: Ubuntu 20.04, Ubuntu 18.04, Debian 10, Mac OSX M1, Mac OSX Intel, Windows 10 Server.
# upgrade pip to its latest version
python3.XX -m pip install -U pip
# install fastdup
python3.XX -m pip install fastdup
where XX is your Python minor version (for example, python3.8).
For Windows, CentOS 7.X, RedHat 4.8 and other older Linux distributions, see our Installation instructions.
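Because the pip commands above target a specific interpreter, it can help to derive the exact command from a Python version tuple. A minimal sketch, assuming the supported versions are exactly those listed above (3.7 to 3.9); `pip_command` is a hypothetical helper, not part of fastdup:

```python
import sys

# Versions listed under "Quick installation" (assumption: this list is complete).
SUPPORTED = {(3, 7), (3, 8), (3, 9)}


def pip_command(version_info=sys.version_info):
    """Build the versioned pip install command for the given interpreter version."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) not in SUPPORTED:
        raise RuntimeError(f"Python {major}.{minor} is not in the supported list")
    return f"python{major}.{minor} -m pip install fastdup"


print(pip_command((3, 8)))  # python3.8 -m pip install fastdup
```

Calling `pip_command()` with no argument checks the interpreter that is currently running.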

What’s new in V1.0?

  • Better support for labels
  • Better galleries
  • A new Python API
from fastdup.engine import Fastdup

fd = Fastdup()
fd.run()

# Use .summary() to get a quick overview of your data:
fd.summary()

# Now you have access to all analysis and galleries using the Fastdup object:
similarity_df = fd.similarity()
outliers_df = fd.outliers()
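`fd.outliers()` returns fastdup's outlier table. As a rough mental model only, one common generic way to score outliers is by how similar each image is to its single nearest neighbor, so images with no close neighbor rank first. The sketch below uses hypothetical toy embeddings and is not fastdup's exact scoring.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def outlier_ranking(embeddings):
    """Rank items by similarity to their nearest neighbor, ascending.

    Items whose closest neighbor is still far away come first.
    """
    scores = {}
    for name, vec in embeddings.items():
        scores[name] = max(
            cosine(vec, other) for other_name, other in embeddings.items() if other_name != name
        )
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical embeddings: three similar dog photos and one unrelated image.
embeddings = {
    "dog1.jpg": [0.8, 0.6, 0.0],
    "dog2.jpg": [0.78, 0.62, 0.05],
    "dog3.jpg": [0.82, 0.57, 0.02],
    "xray.png": [0.0, 0.1, 0.99],
}
ranking = outlier_ranking(embeddings)
print(ranking[0][0])  # xray.png
```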

Running the code

The existing API is still fully supported:
import fastdup
fastdup.run(input_dir='/path/to/your/folder', work_dir='out', nearest_neighbors_k=5, turi_param='ccthreshold=0.96')  # main running function
fastdup.create_duplicates_gallery('out/similarity.csv', save_path='.')  # visual gallery of found duplicates
fastdup.create_outliers_gallery('out/outliers.csv', save_path='.')  # visual gallery of anomalies
fastdup.create_components_gallery('out', save_path='.')  # visualization of connected components
fastdup.create_stats_gallery('out', save_path='.', metric='blur')  # visualization of image statistics (for example, blur)
fastdup.create_similarity_gallery('out', save_path='.', get_label_func=lambda x: x.split('/')[-2])  # top-k similar images; assumes each image's label is its parent folder name
fastdup.create_aspect_ratio_gallery('out', save_path='.')  # aspect ratio gallery
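The `turi_param='ccthreshold=0.96'` argument above sets the threshold for the connected components step. Conceptually, images become nodes, sufficiently similar pairs become edges, and each connected component is a cluster. A minimal union-find sketch of that idea, assuming a hypothetical edge list of `(image_a, image_b, similarity)` tuples and treating 0.96 as a similarity cutoff; this is an illustration, not fastdup's implementation:

```python
def connected_components(edges, threshold=0.96):
    """Group items linked by edges whose similarity is >= threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, sim in edges:
        # Register both nodes so items with only weak edges still appear as singletons.
        find(a)
        find(b)
        if sim >= threshold:
            union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort for a deterministic result: members within a group, then groups by first member.
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


edges = [
    ("a.jpg", "a_copy.jpg", 0.99),
    ("a_copy.jpg", "a_crop.jpg", 0.97),
    ("b.jpg", "b_copy.jpg", 0.98),
    ("b.jpg", "c.jpg", 0.40),  # below threshold, so c.jpg stays on its own
]
components = connected_components(edges)
print(components)
# [['a.jpg', 'a_copy.jpg', 'a_crop.jpg'], ['b.jpg', 'b_copy.jpg'], ['c.jpg']]
```

Because components are transitive, `a.jpg` and `a_crop.jpg` land in the same group even though no direct edge links them; a lower threshold therefore tends to merge larger clusters.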

Getting started examples

Detailed instructions

User community contributions

Stroke AIS Data
Tire Data
Butterfly Mimics
Drugs and Vitamins
Plastic Bottles
Micro Organisms
PCB Boards
ZebraFish
What's the difference

Support and feature requests

Join our Slack channel

fastdup enterprise edition

Visual Layer

About us

Danny Bickson, Amir Alush